SRCH-5154 Bulk delete zombie records from Elastic Search#1743
SRCH-5154 Bulk delete zombie records from Elastic Search#1743
Conversation
e1c91c7 to
30e403c
Compare
stevenbarragan
left a comment
There was a problem hiding this comment.
Plase make sure rspec, cucumber, and codeclimate checks pass.
I have to make one more change to delete the record with |
be72efe to
52117bf
Compare
stevenbarragan
left a comment
There was a problem hiding this comment.
@krbhavith don't forget to fix the test coverage
stevenbarragan
left a comment
There was a problem hiding this comment.
Please make sure file exists where the job will read it. And avoid using rescue as much.
| BulkZombieUrlUploaderJob.perform_later( | ||
| current_user, | ||
| @file.original_filename, | ||
| @file.tempfile.path |
There was a problem hiding this comment.
This approach won't work on staging or production. When the request comes in the tempfile gets created in the app servers but the workers will try to perform the upload from the crawl server.
This file needs to be send to S3 and the job needs to read it from there.
There was a problem hiding this comment.
Existing similar functionality at https://github.com/GSA/search-gov/blob/main/app/controllers/admin/bulk_url_upload_controller.rb#L33 working fine now in production
There was a problem hiding this comment.
How is the worker picking up the tempfile?
0d18682 to
c83b0ed
Compare
c83b0ed to
7b4929d
Compare
stevenbarragan
left a comment
There was a problem hiding this comment.
We should not need to use.send to call methods plase change that. There is a few places where logging can be improved.
| BulkZombieUrlUploaderJob.perform_later( | ||
| current_user, | ||
| @file.original_filename, | ||
| @file.tempfile.path |
There was a problem hiding this comment.
How is the worker picking up the tempfile?
| let(:row) { { 'URL' => 'https://example.com', 'DOC_ID' => nil } } | ||
|
|
||
| it 'logs a missing document ID error' do | ||
| uploader.send(:process_row, row) |
There was a problem hiding this comment.
Do not use send to call methods. If the method is private test it through the parent or make it public.
uploader.process_row, row
| context 'when document ID is present' do | ||
| it 'handles URL processing' do | ||
| allow(uploader).to receive(:handle_url_processing) | ||
| uploader.send(:process_row, row) |
There was a problem hiding this comment.
is there a need to use send?
|
|
||
| describe '#handle_processing_error' do | ||
| subject(:handle_processing_error) do | ||
| uploader.send(:handle_processing_error, error, url, document_id, row) |
|
|
||
| describe '#initialize_results' do | ||
| it 'initializes the results object' do | ||
| uploader.send(:initialize_results) |
There was a problem hiding this comment.
We should not use send. private methods do no need to be unit tested. The uploadmethod in here should have tests for whatinitialize_results` is doing instead.
| # frozen_string_literal: true | ||
|
|
||
| class BulkZombieUrls::FileValidator | ||
| MAXIMUM_FILE_SIZE = 4.megabytes |
There was a problem hiding this comment.
Where this this restriction comes from? I don't see it as part of the ticket.
There was a problem hiding this comment.
It's not the requirement from the ticket, but I am following this as per the previous other upload processes.
Summary
SearchgovUrltable and from Kibana.BulkUploadHandlerto be used by all bulk upload controllers.Checklist
Please ensure you have addressed all concerns below before marking a PR "ready for review" or before requesting a re-review. If you cannot complete an item below, replace the checkbox with the⚠️
:warning:emoji and explain why the step was not completed.Functionality Checks
You have merged the latest changes from the target branch (usually
main) into your branch.Your primary commit message is of the format SRCH-#### <description> matching the associated Jira ticket.
PR title is either of the format SRCH-#### <description> matching the associated Jira ticket (i.e. "SRCH-123 implement feature X"), or Release - SRCH-####, SRCH-####, SRCH-#### matching the Jira ticket numbers in the release.
Automated checks pass. If Code Climate checks do not pass, explain reason for failures:
Process Checks